Background

Previously, I considered whether a gamma or an induced dirichlet prior on the cut points of an ordinal logistic model worked better for the kind of data I’m considering. Neither was very promising, but because models with the gamma prior had convergence and other fitting problems, I’m testing group effects with the induced dirichlet.

In this model, the state of flowering depends on how many forcing units have accumulated.

There are three possible states: 1 = not yet flowering, 2 = flowering, 3 = done flowering.

\(x\) is accumulated forcing: when flowering was observed, how much warmth had the population/tree been exposed to since January 1? \(x\) is always positive (and always increases monotonically through time - though time is abstracted out of this model).

\(\beta\) describes how fast the transition occurs - small \(\beta\)s make the transition between states occur over a wider range of \(x\)’s. (Translated from forcing units to days, this answers a question like “does the population transition between states over 1 day, 3 days, a week?” We work in forcing-unit space because the trees respond to temperature, not time - and no two springs heat up exactly the same, so dates are kind of useless for prediction.) \(\beta\) is always positive.
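The “how fast” question has a closed form for a logistic transition: the probability of having transitioned rises from 10% to 90% over a span of \((\mathrm{logit}(0.9)-\mathrm{logit}(0.1))/\beta \approx 4.39/\beta\) forcing units. A minimal sketch (the \(\beta\) values here are illustrative):

```python
import numpy as np

def transition_width(beta, lo=0.1, hi=0.9):
    """Span of x over which a logistic curve with slope beta rises
    from probability lo to probability hi."""
    logit = lambda p: np.log(p / (1.0 - p))
    return (logit(hi) - logit(lo)) / beta

w_medium = transition_width(1.0)  # beta = 1: ~4.39 forcing units
w_fast = transition_width(2.0)    # beta = 2: half that width
```

Doubling \(\beta\) halves the width, which is why the two simulated transition speeds below behave so differently.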

In the case without groups, an ordinal logistic model with an induced dirichlet prior on cutpoints can struggle to recapture \(\beta\) and cut points \(c\), but it is pretty good at capturing the relationship between \(\beta\) and \(c\): \(h = \frac{c}{\beta}\), which is the point at which half the trees in a population have transitioned or the point at which an individual tree is 50% likely to have transitioned.
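The \(h = \frac{c}{\beta}\) relationship is easy to check numerically: at \(x = c_k/\beta\) the latent value equals the \(k\)-th cut point, so the probability of having passed state \(k\) is exactly 50%. A sketch with made-up values for \(\beta\) and \(c\):

```python
import numpy as np

def ordered_logistic_probs(gamma, c):
    """P(y = k) for k = 1..K in an ordered logistic model with latent
    value gamma and increasing cutpoints c (K = len(c) + 1)."""
    sigma = 1.0 / (1.0 + np.exp(-(gamma - np.asarray(c))))  # P(y > k)
    cum = np.concatenate(([1.0], sigma, [0.0]))
    return cum[:-1] - cum[1:]

beta = 2.0                  # made-up transition speed
c = np.array([10.0, 20.0])  # made-up cutpoints
h = c / beta                # half-transition points in forcing units

# At x = h[0], half the population should still be in state 1
probs = ordered_logistic_probs(beta * h[0], c)
```

At `x = h[0]` the probability of state 1 is exactly 0.5; past `x = h[1]`, more than half have finished flowering.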

In the previous analysis, no groups were considered. In reality, I want to know if things like site, provenance, and clone affect \(h\).

Goals

  1. Generate data with an ordinal logistic model with linear model structure \(\beta x + \alpha\)
  2. Fit model with Stan
  3. Determine whether \(\beta\), the group effects \(\alpha_g\), \(c\), or \(h\) can be recovered.

I’ll simulate data for 10 groups with N=100 observations per group. Each group’s \(h\) is shifted by \(h_{mod}\), drawn from \(\mathcal{N}(0,\sigma)\). Data are simulated for two transition speeds, \(\beta=1\) (medium) and \(\beta=2\) (fast).

First we’ll do this with null effects (\(\sigma=0\)), then with a relatively large range of effects (\(\sigma=1\)).

Each model will be run 5 times to see if the results are stable.
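One way to set up those shifts, assuming the group effect enters through the intercept as in the linear model \(\beta x + \alpha\): the half-transition point of group \(g\) is \((c - \alpha_g)/\beta\), so shifting \(h\) by \(h_{mod}\) corresponds to \(\alpha_g = -\beta\, h_{mod}\). A sketch (this parameterization is my assumption; the simulation program below takes `alpha_g` as data):

```python
import numpy as np

rng = np.random.default_rng(42)
G, beta, sigma = 10, 2.0, 1.0  # sigma = 0 reproduces the null-effects case

# Per-group shifts of the half-transition point h
h_mod = rng.normal(0.0, sigma, size=G)

# Intercepts that realize those shifts: the half-point is (c - alpha_g) / beta,
# so shifting h by h_mod requires alpha_g = -beta * h_mod
alpha_g = -beta * h_mod
```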

Can null group effects be detected?

Simulate data from an ordinal logistic model where there are 10 groups, but none of the groups have an effect.

The Stan model for data simulation is:

## // Simulate ordinal logistic data with a covariate and group effect
## 
## data {
##   int<lower=1> N;
##   int<lower=2> K;
##   int<lower=2> G;
## 
##   positive_ordered[K-1] c; //cutpoints
##   real beta; //slope
## 
##   vector[N] x; //covariate
## 
##   int<lower=1, upper=G> group[G]; // group ids
##   vector[G] alpha_g; //group effects
## 
## }
## 
## generated quantities {
##   int<lower=1, upper=K> y[N,G];                     // Simulated ordinals
##   matrix[N,G] gamma; //simulated linear model
## 
##   for (n in 1:N) {
##     for (g in 1:G) {
##       gamma[n,g] = (x[n] * beta) + alpha_g[group[g]]; // Latent effect
##       y[n,g] = ordered_logistic_rng(gamma[n,g], c); // ordinals
##     }
##   }
## }
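The generated quantities block can be mirrored outside Stan to sanity-check the simulation. A sketch of `ordered_logistic_rng` in Python (illustrative, not the code actually used here):

```python
import numpy as np

def ordered_logistic_rng(gamma, c, rng):
    """Mimic Stan's ordered_logistic_rng: turn a latent value and cutpoints
    into category probabilities, then draw one ordinal outcome in 1..K."""
    c = np.asarray(c, dtype=float)
    sigma = 1.0 / (1.0 + np.exp(-(gamma - c)))  # P(y > k)
    cum = np.concatenate(([1.0], sigma, [0.0]))
    p = cum[:-1] - cum[1:]                      # P(y = k)
    return int(rng.choice(len(c) + 1, p=p)) + 1

rng = np.random.default_rng(7)
beta, c = 2.0, [10.0, 20.0]  # made-up values
x = np.linspace(0.0, 15.0, 100)
alpha_g = np.zeros(10)       # null group effects
y = np.array([[ordered_logistic_rng(beta * xi + a, c, rng) for a in alpha_g]
              for xi in x])
```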

Plots of simulated data for each group should look roughly identical.

## 
## SAMPLING FOR MODEL 'covar_group_alpha_sim' NOW (CHAIN 1).
## Chain 1: Iteration: 1 / 1 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0 seconds (Warm-up)
## Chain 1:                0.00029 seconds (Sampling)
## Chain 1:                0.00029 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'covar_group_alpha_sim' NOW (CHAIN 1).
## Chain 1: Iteration: 1 / 1 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0 seconds (Warm-up)
## Chain 1:                0.000314 seconds (Sampling)
## Chain 1:                0.000314 seconds (Total)
## Chain 1:
Simulated data in each group. Since groups have no effect in this test, all groups should look more or less identical.

Estimate parameters

Next we’ll check whether the models can accurately recapture parameters. Based on findings from the no-group investigation, we’ll fit models with an exponential prior on \(\beta\) with rate 1, 2, or 3, and with the anchor of the dirichlet-induced prior on cutpoints at 10 or 20.

Prior parameters. modelids are unique identifiers for each combination of prior parameters and simulated dataset, ordered from the fast-transition to the medium-transition dataset.

  modelids   anchor   beta_rate
  1, 2       10       1
  3, 4       10       2
  5, 6       10       3
  7, 8       20       1
  9, 10      20       2
  11, 12     20       3

I’m going to do 5 runs of each of the 12 models in Stan.

The Stan program fitting these models is:

## //Betancourt's ordinal logistic with a dirichlet prior
## functions {
##     real induced_dirichlet_lpdf(vector c, vector alpha, real phi) {
##         int K = num_elements(c) + 1;
##         vector[K - 1] sigma = inv_logit(phi - c);
##         vector[K] p;
##         matrix[K, K] J = rep_matrix(0, K, K);
## 
##         // Induced ordinal probabilities
##         p[1] = 1 - sigma[1];
##         for (k in 2:(K - 1))
##             p[k] = sigma[k - 1] - sigma[k];
##         p[K] = sigma[K - 1];
## 
##         // Baseline column of Jacobian
##         for (k in 1:K) J[k, 1] = 1;
## 
##         // Diagonal entries of Jacobian
##         for (k in 2:K) {
##             real rho = sigma[k - 1] * (1 - sigma[k - 1]);
##             J[k, k] = - rho;
##             J[k - 1, k] = rho;
##         }
## 
##         return   dirichlet_lpdf(p | alpha)
##         + log_determinant(J);
##     }
## }
## 
## data {
##     int<lower=1> N;             // Number of observations
##     int<lower=2> K;             // Number of ordinal categories
##     int<lower=2> G;             // Number of groups
## 
##     int<lower=1, upper=K> y[N]; // Observed ordinals
##     vector[N] x;                  // Covariate
## 
##     int<lower=1> group[N]; //Groups
## }
## 
## parameters {
##     positive_ordered[K - 1] c; // (Internal) cut points
##     real beta; // population level effect
## 
##     vector[G] alpha_g;
## 
## }
## 
## // transformed parameters {
## //     real sigma_group = 1; // be real explicit about the prior
## // }
## 
## model {
##     vector[N] phi;
## 
## 
##     // Prior model
##     beta ~ exponential(3); // rate varied across models: 1, 2, or 3
##     c ~ induced_dirichlet(rep_vector(1, K), 0);
##     alpha_g ~ normal(0, 1);
## 
## 
##     // Observational model
## 
##     for (i in 1:N) {
##         phi[i] = (beta * x[i]) + alpha_g[group[i]];
##         y[i] ~ ordered_logistic(phi[i], c);
##     }
## 
## }
## 
## 
## // generated quantities {
## //     vector[N] gamma_ppc;
## //     int<lower=1, upper=K> y_ppc[N];
## //
## //     for (n in 1:N) {
## //         gamma_ppc[n] = beta*x[n];
## //         y_ppc[n] = ordered_logistic_rng(gamma_ppc[n], c);
## //     }
## // }
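The core of `induced_dirichlet_lpdf` is a change of variables: it maps the K-1 cutpoints (relative to the anchor `phi`) to the K ordinal probabilities at the anchor, puts a Dirichlet density on that simplex, and corrects with the log determinant of the Jacobian. The forward map can be sketched as:

```python
import numpy as np

def cutpoints_to_simplex(c, phi=0.0):
    """Map increasing cutpoints c to the K ordinal probabilities at the
    anchor point phi - the simplex the Dirichlet prior is placed over."""
    sigma = 1.0 / (1.0 + np.exp(-(phi - np.asarray(c))))  # P(y > k) at phi
    p = np.empty(len(c) + 1)
    p[0] = 1.0 - sigma[0]
    p[1:-1] = sigma[:-1] - sigma[1:]
    p[-1] = sigma[-1]
    return p

p = cutpoints_to_simplex(np.array([1.0, 3.0]), phi=0.0)  # illustrative cutpoints
```

With a symmetric Dirichlet(1) and the anchor at 0, as in the model block, the prior says all K categories are equally plausible at the anchor point.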

Models have very serious problems (divergences, large rhats, low effective sample sizes, etc.) if the standard deviation of the group effects is given a default (uniform) or \(\mathrm{Exponential}(1)\) prior (not shown).

Performance is better when the prior on \(\alpha_g\) is an explicit \(\mathcal{N}(0,1)\).

Ideally I’d want to fit \(\sigma\), since I don’t actually know it in advance. It’s a problem that the model can’t fit it.

Models with obvious problems.

  modelid   bad_proportion   bad_count   divergences_mean   bad_rhats_mean   bad_neff_mean
  12        0.2              1           0                  2                0
  8         0.2              1           0                  1                0

A few of the models have rhats that are too large in some runs. The problems occur with data where the transition speed is medium (\(\beta=1\)), the prior on \(\beta\) is either loose (rate 1) or tight (rate 3), and the anchor is large (20).

\(\beta\) and \(c\) are always underestimated. \(h\) is estimated ok when the true transition speed is fast, and poorly when it’s medium. The effects (\(\alpha_g\)) are generally centered on 0, with posteriors nearly as wide as the prior.

Parameter estimates with true values as vertical lines


Recapture rate

Models fit to fast-transition data capture more true parameters in their 50% HPDIs than models fit to medium-transition data, but the medium-transition models capture more true parameters in their 90% HPDIs. Medium-speed transitions, however, tend to have fitting issues unless the prior on \(\beta\) is just right. No model captures all parameters. There isn’t a dramatic difference between priors, though a tight \(\beta\) prior and a small anchor depress the number of parameters estimated correctly for fast-transition datasets.

Which parameters are easier or harder to recapture?

As one might have expected from the results in a model with less structure in the data, \(\beta\) and \(c\) aren’t returned well. They are never returned for fast-transition datasets, and they are only returned in the 90% HPDI for medium-transition datasets.

The transition points \(h\) and effects \(\alpha_g\) fare better. The true value, 0, is almost always in the 90% HPDI and, for many groups, in the 50% HPDI. But the 90% HPDI is relatively wide - there’s a fair amount of uncertainty there.
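The recapture check itself is simple: compute the shortest interval containing 50% (or 90%) of the posterior draws and ask whether the true value falls inside it. A sketch of that HPDI computation (illustrative, not the code behind the figures):

```python
import numpy as np

def hpdi(samples, prob):
    """Shortest interval containing `prob` mass of the posterior draws."""
    s = np.sort(np.asarray(samples))
    n = len(s)
    m = int(np.ceil(prob * n))           # number of draws the interval must hold
    widths = s[m - 1:] - s[: n - m + 1]  # width of every candidate interval
    i = int(np.argmin(widths))
    return s[i], s[i + m - 1]

def recaptured(samples, truth, prob):
    lo, hi = hpdi(samples, prob)
    return lo <= truth <= hi

rng = np.random.default_rng(3)
draws = rng.normal(0.0, 1.0, 4000)  # stand-in for posterior draws of one alpha_g
```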

Proportion of models where the true parameter is returned in the 50% (light) or 90% (dark) HPDI. Columns are transition speed and prior on beta; rows are anchors for the cutpoint prior.

Can group effects in a range of sizes be detected?

While I am hoping group effects will largely be 0 or very close to it, I need to be able to detect them when they are not. So, now that we know the model can detect null effects, I need to know what size of effect it can detect. To that end, I will simulate data from an ordinal logistic model as above, but this time with group effects on the intercept \(\alpha\) distributed \(\mathcal{N}(0,1)\).

## 
## SAMPLING FOR MODEL 'covar_group_alpha_sim' NOW (CHAIN 1).
## Chain 1: Iteration: 1 / 1 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0 seconds (Warm-up)
## Chain 1:                0.000296 seconds (Sampling)
## Chain 1:                0.000296 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'covar_group_alpha_sim' NOW (CHAIN 1).
## Chain 1: Iteration: 1 / 1 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0 seconds (Warm-up)
## Chain 1:                0.000296 seconds (Sampling)
## Chain 1:                0.000296 seconds (Total)
## Chain 1:
Simulated data in each group. Since groups have different effects, the groups should look different in these plots.

Estimate parameters

Does the model accurately recapture parameters? I’ll use the same priors as above, and the same Stan model.

Again, models for medium-transition-speed data struggled - generally rhats too large. Problems don’t occur with models for fast-transition data. Models for medium transition speed avoid convergence and other problems when the anchor is 10 with a relatively constrained \(\beta\) prior (rate 2 or 3), or the anchor is 20 with a tight \(\beta\) prior (rate 3).

Models with obvious problems (data simulated with \(\alpha\) group effects, fit with the \(\alpha\)-effect model).

  modelid   bad_proportion   bad_count   divergences_mean   bad_rhats_mean   bad_neff_mean
  10        0.2              1           0                  10               0
  2         0.4              2           0                  5                0
  8         0.2              1           0                  3                0

\(\beta\) and \(c\) are always underestimated, quite dramatically when the underlying transition speed is fast. \(h\) is underestimated as well, though not as badly. When the true \(\alpha_g\) is near 0, its estimate is worse.

Recapture rate

Most parameters are not captured in the 50% HPDI. Parameters for medium transition speed data are easier to recapture.

As one might have expected from the results in a model with less structure in the data, \(\beta\) and \(c\) are never returned in the 50% HPDI. For medium-transition-speed data they are returned in the 90% HPDI, but never for fast-transition-speed data.

The transition points and effects \(\alpha_g\) fare better. The true values of \(h\) and \(\alpha_g\) are generally returned in the 90% HPDI, but very rarely in the 50%.

tl;dr

Null group effects are recaptured when all effects are null, but probably with too much uncertainty to be useful in my situation. Null effects are not recaptured when the other groups’ effects are not 0. Group effects and transitions are not recaptured very well - \(h\) is consistently underestimated. And the actual parameters \(\beta\) and \(c\) are not trustworthy.